Phase 1 implementation of InstanceState service (no GMS dependency)
Implementation Approach
When the DAS starts, it reads the .instancestate file and places every instance in that domain in its last recorded state.
This avoids pinging the instances during startup, which would add to the start time.
When a new instance is created, it is put in the NEVER_STARTED state (Id = 1).
An instance's state changes only when a command targeted at that instance arrives and the framework tries to replicate the command on it.
I differentiate the list-instances command from other commands (a separate event 4) because an instance can move from RUNNING to NO_RESPONSE only if the list-instances command fails on that instance.
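The startup and creation rules above can be sketched as a small in-memory store. This is a minimal sketch, not the actual service: the class `InstanceStateStore` and its methods are hypothetical, and the .instancestate file format is assumed here to be one `instanceName=STATE` entry per line, which this note does not specify. Only NEVER_STARTED has a documented Id, so the enum carries no ids.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// States named in this design; only NEVER_STARTED has a documented Id (1).
enum State { NEVER_STARTED, NO_RESPONSE, RUNNING, RESTART_REQD }

// Hypothetical holder for the per-instance state kept by the DAS.
class InstanceStateStore {
    private final Map<String, State> states = new HashMap<>();

    // Called once at DAS startup. Assumed format: "instanceName=STATE" per line.
    // No instance is pinged here, so startup time is not affected.
    void loadLastRecorded(Reader in) {
        Properties p = new Properties();
        try {
            p.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (String name : p.stringPropertyNames()) {
            states.put(name, State.valueOf(p.getProperty(name).trim()));
        }
    }

    // A newly created instance always starts out as NEVER_STARTED.
    void addNewInstance(String name) {
        states.put(name, State.NEVER_STARTED);
    }

    State get(String name) {
        return states.get(name);
    }
}
```

Reading the file into memory rather than probing each instance is what keeps DAS startup independent of how many instances the domain has.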
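The command-driven transitions described above can be expressed as a pure function of the current state and the replication result. This is a minimal sketch under stated assumptions: `Transitions.onCommandResult` is a hypothetical name, a successful replication is assumed to move any state except RESTART_REQD to RUNNING, and a failure of any command other than list-instances is assumed to leave the state unchanged. How an instance enters RESTART_REQD is not covered in this section, so it is not modeled.

```java
// States named in this design.
enum State { NEVER_STARTED, NO_RESPONSE, RUNNING, RESTART_REQD }

// Hypothetical helper applied after the framework replicates a command on an instance.
final class Transitions {
    private Transitions() {}

    static State onCommandResult(State current, String command, boolean succeeded) {
        if (current == State.RESTART_REQD) {
            // Only the end of file synchronization moves an instance out of RESTART_REQD.
            return current;
        }
        if (succeeded) {
            return State.RUNNING;
        }
        if ("list-instances".equals(command) && current == State.RUNNING) {
            // The separate list-instances event: the only path from RUNNING to NO_RESPONSE.
            return State.NO_RESPONSE;
        }
        // Assumption: other failures leave the recorded state unchanged.
        return current;
    }
}
```

For example, a failed `deploy` replication on a RUNNING instance leaves it RUNNING, whereas a failed list-instances moves it to NO_RESPONSE.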
When an instance starts and synchronizes with the DAS, the DAS resets the instance from RESTART_REQD to NO_RESPONSE at the end of the _synchronize-file commands; a subsequent command on that target then moves it to RUNNING (assuming that command succeeds). This is the only way an instance can move out of the RESTART_REQD state.
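The two-step exit from RESTART_REQD can be sketched as follows. This is a hedged illustration only: `SyncHandler`, `onSyncComplete`, and `onSubsequentCommand` are hypothetical names, and `onSyncComplete` is assumed to be invoked once the last _synchronize-file command for the instance has finished. States other than RESTART_REQD are assumed to be left alone by the sync hook.

```java
// States named in this design.
enum State { NEVER_STARTED, NO_RESPONSE, RUNNING, RESTART_REQD }

// Hypothetical hooks for the sync path out of RESTART_REQD.
final class SyncHandler {
    private SyncHandler() {}

    // Assumed to run after the final _synchronize-file command for an instance.
    // Per the design, RESTART_REQD is reset to NO_RESPONSE, not directly to RUNNING.
    static State onSyncComplete(State current) {
        return current == State.RESTART_REQD ? State.NO_RESPONSE : current;
    }

    // A later command that succeeds on the instance promotes it to RUNNING.
    static State onSubsequentCommand(State current, boolean succeeded) {
        return (succeeded && current == State.NO_RESPONSE) ? State.RUNNING : current;
    }
}
```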
There is no NOT_RUNNING state because a failed ping to an instance does not always mean that the instance is down; hence the NO_RESPONSE state.
There is no STARTING* state for an instance at all (and hence no queueing of commands) because:
The hidden command that replaces the JOIN_AND_READY event can fail.
An instance may be started with the --nosync option, in which case it must send a different hidden command once it is ready, and that command can also fail.
Notes:
Recovering from the failure scenarios above, in which the new hidden commands (meant to inform the DAS of instance state changes) fail, is certainly possible: start a timer on a failure, retry the ping X times, and so on. But does it make sense for us to spend the time and effort on this when the customer can easily opt for GMS, and we can provide a better solution as described here?
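To give a sense of the effort involved, the recovery path weighed in this note could start from something like the following minimal sketch. `PingRetry.retryPing` is a hypothetical helper, the ping is abstracted as a `BooleanSupplier`, and a blocking loop with `Thread.sleep` stands in for the timer; a real version would more likely schedule retries on a `java.util.concurrent.ScheduledExecutorService` so the DAS thread is not blocked.

```java
import java.util.function.BooleanSupplier;

// Hypothetical recovery helper for a failed hidden state-change command.
final class PingRetry {
    private PingRetry() {}

    // Pings up to maxAttempts times, waiting delayMillis between tries.
    // Returns true as soon as one ping succeeds, false if all attempts fail.
    static boolean retryPing(BooleanSupplier ping, int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (ping.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return false;
                }
            }
        }
        return false;
    }
}
```

Even this simple form leaves open questions (which state to report while retrying, what X should be), which is part of the cost the note is weighing against relying on GMS.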